-
Notifications
You must be signed in to change notification settings - Fork 2.9k
feat: add spreadsheet mime types to load_artifact_tool #4429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @n-issei-777, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a great enhancement, adding support for parsing common spreadsheet formats (.xlsx, .xls, .ods) into Markdown within the load_artifact_tool. The implementation using pandas is a solid approach. My feedback focuses on a few areas for improvement: managing the new heavy dependencies by making them optional, refining the implementation for better clarity and error handling, and significantly strengthening the unit tests to cover all newly supported file types and improve assertion robustness.
| async def test_load_artifacts_parses_spreadsheet(): | ||
| """Spreadsheet artifacts are parsed into markdown.""" | ||
| artifact_name = 'test.xlsx' | ||
|
|
||
| import pandas as pd | ||
| import io | ||
|
|
||
| df = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']}) | ||
| output = io.BytesIO() | ||
| # Use openpyxl as engine since it is in deps | ||
| with pd.ExcelWriter(output, engine='openpyxl') as writer: | ||
| df.to_excel(writer, sheet_name='Sheet1', index=False) | ||
| xlsx_bytes = output.getvalue() | ||
|
|
||
| artifact = types.Part( | ||
| inline_data=types.Blob( | ||
| data=xlsx_bytes, | ||
| mime_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' | ||
| ) | ||
| ) | ||
|
|
||
| tool_context = _StubToolContext({artifact_name: artifact}) | ||
| llm_request = LlmRequest( | ||
| contents=[ | ||
| types.Content( | ||
| role='user', | ||
| parts=[ | ||
| types.Part( | ||
| function_response=types.FunctionResponse( | ||
| name='load_artifacts', | ||
| response={'artifact_names': [artifact_name]}, | ||
| ) | ||
| ) | ||
| ], | ||
| ) | ||
| ] | ||
| ) | ||
|
|
||
| await load_artifacts_tool.process_llm_request( | ||
| tool_context=tool_context, llm_request=llm_request | ||
| ) | ||
|
|
||
| artifact_part = llm_request.contents[-1].parts[1] | ||
| assert artifact_part.inline_data is None | ||
|
|
||
| # Check for Markdown table content | ||
| # We expect something like: | ||
| # ### Sheet: Sheet1 | ||
| # | ||
| # | col1 | col2 | | ||
| # | :--- | :--- | | ||
| # | 1 | a | | ||
| # | 2 | b | | ||
|
|
||
| assert "Sheet1" in artifact_part.text | ||
| assert "col1" in artifact_part.text | ||
| assert "col2" in artifact_part.text | ||
| assert "1" in artifact_part.text | ||
| assert "a" in artifact_part.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new test only validates parsing for .xlsx files. Since this change also adds support for .xls and .ods files, it's crucial to add test cases for these formats as well. This will ensure that the required dependencies (xlrd, odfpy) are correctly integrated and will prevent future regressions. Consider using pytest.mark.parametrize to test all supported spreadsheet types with a single test function.
Delete mime_type parameter within the _parse_spreadsheet function. Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
Hi @n-issei-777 , Thank you for your contribution! We appreciate you taking the time to submit this pull request. |
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
Problem:
Currently, load_artifact_tool does not support common spreadsheet MIME types such as .xlsx, .xls, and .ods. As discussed in the community discussion #3450, when users upload these files via the ADK Web, the tool fails to load them. This forces users to implement redundant custom parsing logic in before_agent_callback to handle these files manually.
Solution:
I have added the following spreadsheet MIME types to _SPREADSHEET_MIME_TYPES to ensure they are correctly identified:
With this change, the tool can now successfully read these files and convert their content into Markdown format. This allows the LLM to process spreadsheet data more effectively. Furthermore, it is capable of handling files with multiple sheets, converting each sheet into its own Markdown table.
Testing Plan
I verified the fix using a local test agent (agent.py) that utilizes load_artifact_tool.
Uploaded .xlsx, .xls and, .ods files and confirmed it is now correctly recognized by the tool.
Verified that the agent can access the artifact data without needing manual parsing in callbacks.
Unit Tests:
Manual End-to-End (E2E) Tests:



Each file content can now be read by load_artifact_tool as shown below. The tool can successfully process and load data from all available sheets when multiple sheets exist.
.xlsx file
.xls file
.ods file*
One of the test file is below.d
test_ss.xlsx
Checklist
Additional context
Without this fix, load_artifact_tool cannot read the contents of uploaded .xlsx, .xls, or .ods files.
